SQL query

Back to DuckDB Data Engineering Glossary

Definition

A SQL query is a structured request written in Structured Query Language (SQL) that allows you to retrieve, analyze, or manipulate data stored in a database. The most common type of query uses the SELECT statement to read data, though queries can also modify data using statements like INSERT, UPDATE, and DELETE.

Basic Structure

A typical SQL query starts with SELECT followed by the columns you want to retrieve, then FROM to specify the source table(s), and optionally WHERE to filter results. Additional clauses like GROUP BY, HAVING, and ORDER BY help shape and organize the output.

DuckDB Examples

Here's a simple query that selects all columns from a table:

Copy code

SELECT * FROM employees;

A more complex query showing filtering, grouping and ordering:

Copy code

SELECT department, COUNT(*) as employee_count, AVG(salary) as avg_salary FROM employees WHERE status = 'active' GROUP BY department HAVING COUNT(*) > 5 ORDER BY avg_salary DESC;

DuckDB extends standard SQL syntax by allowing you to omit the SELECT keyword when using *, making this valid:

Copy code

FROM employees;

You can also use DuckDB's unique column selection syntax to dynamically select columns:

Copy code

SELECT COLUMNS('salary_.*') FROM employees;

Common Use Cases

Data analysts and engineers use SQL queries to perform tasks like generating reports, analyzing trends, cleaning data, and integrating different data sources. Queries can range from simple one-line statements to complex operations involving multiple joins, subqueries, and window functions.

DuckDB-Specific Features

DuckDB optimizes queries for analytical workloads and supports querying data directly from files like CSV and Parquet without first loading them into tables. It also includes extensions to standard SQL that make common analytical tasks more convenient, such as simplified grouping syntax and enhanced column selection capabilities.